研究人员高度利用了原位同步加速器高能X射线粉末衍射(XRD)技术,可以分析功能设备(例如电池材料)或复杂样品环境中材料的晶体结构反应堆)。材料的原子结构可以通过其衍射模式以及详细的分析(例如Rietveld的细化)来识别,该分析表明测量的结构如何偏离理想结构(例如内部应力或缺陷)。对于原位实验,通常在不同条件下(例如绝热条件)在同一样本上收集一系列XRD图像,产生不同的物质状态,或者简单地作为时间的时间连续收集,以跟踪样品的变化超过化学或物理过程。原位实验通常与区域探测器一起进行,收集由理想粉末的衍射环组成的2D图像。根据材料的形式,人们可能会观察到除现实样本及其环境的典型Debye Scherrer环以外的其他特征,例如纹理或优选方向以及2D XRD图像中的单晶衍射点。在这项工作中,我们介绍了对机器学习方法的研究,以快速可靠地识别XRD图像中的单晶衍射点。在XRD图像整合过程中排除伪影的排除允许精确分析感兴趣的粉末衍射环。我们观察到,当用高度多样的数据集对较小的子集进行训练时,梯度提升方法可以始终如一地产生高精度的结果。与常规方法相比,该方法大大减少了识别和分离单晶斑所花费的时间。
translated by 谷歌翻译
Estimating the 6D pose of objects is one of the major fields in 3D computer vision. Since the promising outcomes from instance-level pose estimation, the research trends are heading towards category-level pose estimation for more practical application scenarios. However, unlike well-established instance-level pose datasets, available category-level datasets lack annotation quality and provided pose quantity. We propose the new category level 6D pose dataset HouseCat6D featuring 1) Multi-modality of Polarimetric RGB+P and Depth, 2) Highly diverse 194 objects of 10 household object categories including 2 photometrically challenging categories, 3) High-quality pose annotation with an error range of only 1.35 mm to 1.74 mm, 4) 41 large scale scenes with extensive viewpoint coverage, 5) Checkerboard-free environment throughout the entire scene. We also provide benchmark results of state-of-the-art category-level pose estimation networks.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
We introduce M-VADER: a diffusion model (DM) for image generation where the output can be specified using arbitrary combinations of images and text. We show how M-VADER enables the generation of images specified using combinations of image and text, and combinations of multiple images. Previously, a number of successful DM image generation algorithms have been introduced that make it possible to specify the output image using a text prompt. Inspired by the success of those models, and led by the notion that language was already developed to describe the elements of visual contexts that humans find most important, we introduce an embedding model closely related to a vision-language model. Specifically, we introduce the embedding model S-MAGMA: a 13 billion parameter multimodal decoder combining components from an autoregressive vision-language model MAGMA and biases finetuned for semantic search.
translated by 谷歌翻译
Recent work in large language modeling (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user. This work assumes that human preferences are static and homogeneous across individuals, so that aligning to a a single "generic" user will confer more general alignment. Here, we embrace the heterogeneity of human preferences to consider a different challenge: how might a machine help people with diverse views find agreement? We fine-tune a 70 billion parameter LLM to generate statements that maximize the expected approval for a group of people with potentially diverse opinions. Human participants provide written opinions on thousands of questions touching on moral and political issues (e.g., "should we raise taxes on the rich?"), and rate the LLM's generated candidate consensus statements for agreement and quality. A reward model is then trained to predict individual preferences, enabling it to quantify and rank consensus statements in terms of their appeal to the overall group, defined according to different aggregation (social welfare) functions. The model produces consensus statements that are preferred by human users over those from prompted LLMs (>70%) and significantly outperforms a tight fine-tuned baseline that lacks the final ranking step. Further, our best model's consensus statements are preferred over the best human-generated opinions (>65%). We find that when we silently constructed consensus statements from only a subset of group members, those who were excluded were more likely to dissent, revealing the sensitivity of the consensus to individual contributions. These results highlight the potential to use LLMs to help groups of humans align their values with one another.
translated by 谷歌翻译
Despite being responsible for state-of-the-art results in several computer vision and natural language processing tasks, neural networks have faced harsh criticism due to some of their current shortcomings. One of them is that neural networks are correlation machines prone to model biases within the data instead of focusing on actual useful causal relationships. This problem is particularly serious in application domains affected by aspects such as race, gender, and age. To prevent models from incurring on unfair decision-making, the AI community has concentrated efforts in correcting algorithmic biases, giving rise to the research area now widely known as fairness in AI. In this survey paper, we provide an in-depth overview of the main debiasing methods for fairness-aware neural networks in the context of vision and language research. We propose a novel taxonomy to better organize the literature on debiasing methods for fairness, and we discuss the current challenges, trends, and important future work directions for the interested researcher and practitioner.
translated by 谷歌翻译
在过去几年中,水下车辆操纵器系统(UVMS)变得越来越小,越来越小,在计划和控制系统时,考虑操纵器和车辆之间的耦合力变得越来越重要。但是,处理这些力的典型方法需要媒介物的精确流体动力模型,并在操纵器上使用低级扭矩控制,这两者在现场都很少见。因此,许多UVMS控制方法都是基于运动学的,无法固有地解释这些效果。我们的工作通过训练模拟UVMS数据上的复发性神经网络来弥合运动学控制与动态之间的差距,以根据系统以前的状态预测将来车辆的音高。运动学计划者和控制者可以使用此指标来合并动态知识,而无需计算昂贵的模型,从而提高了他们执行水下操纵任务的能力。
translated by 谷歌翻译
从具有高隐私要求的领域(例如医疗干预空间)获得的真实数据较低,并且收购在法律上很复杂。因此,这项工作提供了一种以医疗服装为例为医疗环境创建合成数据集的方法。目的是缩小合成数据和真实数据之间的现实差距。为此,使用虚幻的引擎插件或Unity比较了3D扫描服装和设计服装的方法。此外,还使用了绿屏和目标域数据集的混合现实数据集。我们的实验表明,设计服装的结构性域随机化以及混合现实数据提供了基线,可在临床目标域的测试数据集上实现72.0%的地图。当使用15%可用的目标域列车数据时,针对100%(660张图像)目标域列车数据的差距几乎可以关闭80.05%的地图(81.95%地图)。最后,我们表明,当使用100%目标域训练数据时,精度可以提高到83.35%的地图。
translated by 谷歌翻译
注释滥用语言很昂贵,在逻辑上复杂,并造成了心理伤害的风险。但是,大多数机器学习研究都优先提高有效性(即F1或精度得分),而不是数据效率(即,最小化注释的数据量)。在本文中,我们在两个数据集上使用模拟实验,以不同比例的滥用,以证明基于变形金刚的主动学习是一种有前途的方法,可以实质上提高效率,同时仍然保持高效,尤其是当虐待内容是数据集中较小比例的情况下。这种方法需要大量的标记数据,以达到与完整数据集培训相等的性能。
translated by 谷歌翻译
来自不同摄像头设备的光学相干断层扫描(OCT)成像会导致挑战域的变化,并可能导致机器学习模型的精度严重下降。在这项工作中,我们引入了基于单数值分解(SVDNA)的最小噪声适应方法,以克服视网膜OCT成像中三个不同设备制造商的目标域之间的域间隙。我们的方法利用噪声结构的差异成功地弥合了不同OCT设备之间的域间隙,并将样式从未标记的目标域图像转移到可用手动注释的源图像。我们演示了该方法尽管简单,但如何比较甚至胜过最先进的无监督域适应方法,用于在公共OCT数据集中进行语义细分。 SVDNA可以将仅几行代码集成到任何网络的增强管道中,这些网络与许多最新的域适应方法形成鲜明对比,这些方法通常需要更改基础模型体系结构或训练单独的样式转移模型。 SVDNA的完整代码实现可在https://github.com/valentinkoch/svdna上获得。
translated by 谷歌翻译